SUMMARY


The objective of this study is to establish the 
feasibility, cost, and benefits of software reliability 
measurement in a specific environment, and to formulate from
this study recommendations for more general applications. The
ultimate goal of the entire program is the determination of
software failure rate parameters analogous to hardware failure
rate or wear-out rate parameters. 

During the first phase of this effort, data collected 
on categories of errors encountered during a software 
development was analyzed to determine if consistent measures 
could be derived to use as valid indicators of reliability. 

The failure ratio (number of failed runs F observed in N total 
runs in a given elapsed time) and the failure rate (number of 
failed runs F observed in t seconds of CPU time in a given 
elapsed calendar time) demonstrated qualification as such 
measures. 

The principal effort since the last report has been to 
apply rigorous statistical analyses and non-parametric methods
to the existing data base. Linear and non-linear orthogonal 
polynomial regression analyses confirmed the validity of the 
failure rate and ratio as potential measures of reliability. A 
high positive statistical correlation was shown between failure
severity and error category, failure severity and error count,
and error category with error count. In addition, a
preliminary investigation into reliability forecasting showed 
the ensemble averages of both the failure rate and ratio are 
stationary and statistically significant. 

The failure rate and ratio measures appear to remain 
valid indicators when subjected to the parametric and 
non-parametric analyses described in this report. The
preliminary attempt at forecasting was statistically valid, but
this, of course, needs to be validated by real-world
observations. Problems encountered in data collection due to
lack of direct control over the process highlighted the need
for formalizing this critical portion of any future effort even
if the cost increases. Operational avionics systems should
provide a superior source of failure data since the recording
of such information is routinely performed as a part of
aircraft maintenance by personnel other than the software staff. 

1. INTRODUCTION


This report summarizes work performed at The Aerospace
Corporation on a software reliability measurement study for the
Langley Research Center, National Aeronautics and Space
Administration, under Contract NAS1-14392. The specific
objective of the study is to establish the feasibility, cost,
and benefits of such measurement in a specific environment, and
to formulate from this study recommendations for more general
applications of software reliability measurement directed
towards the goal of determining software failure rate
parameters analogous to hardware failure rate parameters. A 
collateral objective is the identification of any other factors 
possibly contributing to software reliability that might be 
observed during the course of the data collection and analysis 
effort. 

This study was initiated in April, 1976. The work 
accomplished between April, 1976 and April, 1977 was reported 
in September, 1977 in NASA Contractor Report 145205.^ This
report covers the work performed between April, 1977 and June, 
1978. 

Data analyzed in this study came from two sources: 

(1) Project ASTROS (Advanced Systematic Techniques for Reliable 
Operational Software)^, a joint effort of the Space and 
Missile Test Center (SAMTEC) and the Rome Air Development
Center (RADC), both organizations within the Air Force Systems
Command; and (2) the NASA Viking Program at the Jet Propulsion 
Laboratory. 

For the purpose of this study we have defined reliable 
software as follows: 

It is software that is correct (capable of execution 
and yielding correct results) and that meets other 
user requirements such as timing and interfacing with 
the environment. 

This concept is consistent with an earlier statement, "Software 
possesses reliability to the extent that it can be expected to 
perform its intended functions satisfactorily."^ There is 
justifiable concern about attempting to base measurement on 
"intended functions", but more restrictive formulations tend to 
prevent recognition of reliability problems arising from poorly 
drawn specifications. A need exists to evaluate software 
reliability against formally specified, as well as against more 
loosely defined or implied requirements. 

For reliability measurement, the software is operated 
over a period of time; segments of the operation are scored as 
failure or success by the qualitative criteria cited above; 
and, from these scores, an indicator of measured reliability is
generated.

The principal indicators derived from the data are the
failure ratio and the failure rate. The failure ratio, U, is
defined as 


U = F/N (1) 

where F is the number of failures observed in N runs in a given 
calendar period, usually one month. The failure rate, u, is 
defined as 


u = f/t (2) 

where f is the number of failures observed during the total 
time, t seconds, accumulated over a given calendar period, 
again usually one month. These failure metrics, and
particularly their complements, the reliability metrics,

R = 1 - U = S/N (3)

where S stands for the number of successes and

MTBF = t/f (4)

are analogous to commonly used hardware reliability
expressions. The relation of these metrics to those used by
other researchers in software reliability is described
elsewhere.^2
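
As an illustration of equations (1) through (4), the following minimal sketch computes the four metrics from one period's run records. The `PeriodLog` record layout and the sample numbers are assumptions made for this example only; they are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class PeriodLog:
    """Run records for one calendar period (hypothetical layout)."""
    runs: int            # N, total runs in the period
    failed_runs: int     # F (and f), failed runs in the period
    cpu_seconds: float   # t, CPU time accumulated in the period


def failure_ratio(log):
    """Equation (1): U = F/N."""
    return log.failed_runs / log.runs


def failure_rate(log):
    """Equation (2): u = f/t, failures per CPU second."""
    return log.failed_runs / log.cpu_seconds


def reliability(log):
    """Equation (3): R = 1 - U = S/N."""
    return 1.0 - failure_ratio(log)


def mtbf(log):
    """Equation (4): MTBF = t/f, in CPU seconds."""
    if log.failed_runs == 0:
        return float("inf")  # no failures observed in the period
    return log.cpu_seconds / log.failed_runs


# Illustrative month: 200 runs, 30 failures, 6000 CPU seconds (made-up values).
month = PeriodLog(runs=200, failed_runs=30, cpu_seconds=6000.0)
print(failure_ratio(month), failure_rate(month), reliability(month), mtbf(month))
```

Note that R is the complement of the failure ratio, while MTBF is the reciprocal of the failure rate.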

The failure ratio and the failure rate are obtainable 
from records usually maintained in the development of critical 
software; they are consistent in time and among modules for the 
specific program studied; and they are potentially useful for 
management and research purposes. 

The use of the failure ratio, i.e., the ratio of 
failed runs to total runs in a given period of time, as a 
measure of software reliability is one of the innovations 
introduced in this study. Previous investigators had simply 
reported the number of failures per calendar interval. To the 
extent that the number of runs per month (or other interval) is 
not uniform, these measures will yield different results. For 
most purposes, the measure that will be preferred is the one 
that has the smallest variability. In the earlier report on 
this study it was shown that the failure ratio affords a more 
stable measure of reliability. 
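
One way to compare the variability of the two measures is the coefficient of variation (standard deviation divided by mean). The sketch below is only an illustration with contrived monthly figures, chosen so that the number of runs varies widely while the proportion of failed runs does not; it does not reproduce the comparison made in the earlier report.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Contrived data: the run count varies by month, the failed fraction does not.
runs_per_month = [100, 200, 400, 150]
failures_per_month = [10, 20, 40, 15]

ratios = [f / n for f, n in zip(failures_per_month, runs_per_month)]

cv_counts = coefficient_of_variation(failures_per_month)
cv_ratios = coefficient_of_variation(ratios)
print(cv_counts, cv_ratios)
```

With non-uniform run counts the raw failure count inherits the variation in workload, whereas the ratio removes it.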

In the course of the study it was observed that many 
runs ended in failure due to improper data setups, job control 
cards, or other factors not directly associated with the code 
developed. By counting as failures only those runs in which 
the cause of the failure resided in the program proper, we 
generated the program failure ratio. 

Both the total failure ratio and the program failure 
ratio exhibit a general trend with time. By the use of
regression, trend lines can be generated for the development 
period and/or for the most recent intervals to provide 
indicators of progress or lack of progress. The generation and 
use of these trend lines is discussed in the previous 
report.^
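
A trend line of the kind described can be sketched with ordinary least squares on the period index. The function name and the monthly values below are hypothetical; the previous report should be consulted for the procedure actually used.

```python
def fit_trend_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Requires at least two points.
    """
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical monthly failure ratios over a development period.
monthly_failure_ratio = [0.20, 0.18, 0.15, 0.16, 0.12, 0.10]

# Trend over the whole period and over the three most recent intervals.
slope, intercept = fit_trend_line(monthly_failure_ratio)
recent_slope, _ = fit_trend_line(monthly_failure_ratio[-3:])
print(slope, intercept, recent_slope)
```

A negative slope indicates progress (a declining failure ratio); comparing the whole-period slope with the recent slope shows whether improvement is continuing.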

The principal effort since the last report has been to 
verify the validity of these measures by more rigorous 
statistical analyses and to determine if meaningful 
correlations could be observed between variables existing in 
the data base. Linear and non-linear orthogonal polynomial 
regression analyses corroborate the effective use of the 
failure rate and ratio as measures of reliability. A high 
positive correlation was shown between failure severity and 
error category, failure severity and error count, and error 
category with error count. In addition, a preliminary 
investigation into reliability forecasting showed that the 
ensemble averages of both the failure rate and ratio are 
stationary and the confidence limits were defined. 

The failure rate and ratio measures appeared to remain 
valid indicators when subjected to the parametric and 
non-parametric analyses described in this report. The methods
for analyses that were developed may be generalized to a broad 
class of problems; however, the specific results should only be 
generalized to comparable data bases. 
Problems encountered in data collection due to lack of direct
control over the process highlight the need for formalizing 
this critical portion of any future effort. Operational 
avionics systems should provide a superior source of failure 
data since the recording of such information is routinely 
performed as a part of aircraft maintenance by personnel other 
than the software staff. 

During this study, a search of the literature for 
generalized models of software reliability was conducted. The 
bibliography resulting from this search is contained in 
Appendix D. 

2. BACKGROUND


As noted earlier the data for this study came from two 
sources: (1) Project ASTROS at the Space and Missile Test 
Center; and (2) NASA's Viking program at the Jet Propulsion 
Laboratory. These data bases are briefly described in the 
following sections.

2.1 Project ASTROS Data 

The ASTROS data that was analyzed in this report 
was collected during the development of the Launch Support Data 
Base (LSDB), a portion of the Metric Integrated Processing
System (MIPS). MIPS provides the primary metric (i.e.,
positional) data processing for test or trajectory measurement
activities on missiles, aircraft, and satellites. MIPS
includes control, real-time, and non-real-time segments. LSDB
is a non-real-time segment that includes data management
functions, coordinate transformations, and other scientific
calculations supporting track generation from multiple
sources. It is run prior to launch operations without 
real-time constraints. 

The design of LSDB was started in September, 1975. 

The software failure data was collected during development of 
the LSDB from the initial coding through the in-house test 
phases prior to acceptance by the government. During the 
development, the number of lines of code continually increased
as runs were being made, and the effect of these changes on the
reliability measurements is discussed later in this report.
During program development there was no unusual pressure to
control reliability for current runs, but there was adherence
to normal standards for reliable software.

LSDB was developed as part of a demonstration program 
for structured programming techniques. Personnel appeared to be
motivated by their participation in such a demonstration, and
the data collection efforts and management attention may have
constituted confounding human factors that affected both the 
data and the measurements. 

The MIPS system specification required a modular 
program structure, hierarchical program design, and execution 
ordered programming. In addition to these overall 
requirements, the decision was made to create a highly 
disciplined programming environment for portions of the
non-real-time segment that include the LSDB. This environment 
included the following: 

a. top-down development 

b. structured code 

c. program support library

d. chief programmer teams 

e. structured walk-throughs.

The data accumulated for the evaluation phase provided
a unique opportunity to conduct software reliability 
measurements during program development. 

The LSDB program is composed of five major components
(here referred to as "modules") consisting of approximately 40
independent subroutines (referred to as "utilities"). The
modules, linked with controls, are illustrated in Figure 
2-1. The entire LSDB Program comprises approximately 25,000 
lines of FORTRAN source code, of which the modules account for 
about 18,000 lines. Of the total, approximately 40 percent of 
the module code consists of comments. Most of the LSDB code 
was written in structured FORTRAN, translated into ANSI FORTRAN 
by means of the S-FORTRAN precompiler, and then compiled on an 
IBM 360/65 computer. Small segments were written in the IBM 
assembly language (BAL). Originally, five programmers were
assigned to LSDB. After a few months, the participation was 
reduced to a staff of three plus a programmer-librarian. 

SAMTEC Data Documentation 

For every run made on LSDB, a run analysis report form
was completed that listed the date, the module name, CPU time 
for the run, and coded information on the number of changes and 
run steps as shown in Appendix B. The run was scored as a 
success or failure by the development group. If a run was 
identified as a failure, additional information, contained in 
the failure analysis report, was provided identifying the type 
and cause of failure. This form was also prepared by the
program development personnel. This form is the second exhibit 
in Appendix B. 

It was not known a priori what factors in the 
programming and computer system environment might affect 
software reliability. For that reason, the stipulated
requirements for the software product (here LSDB) as well as a
description of the general environment were included as part of
the record of this Software Reliability Measurement Study.

Forms for reporting this background information are reproduced 
in Appendix B. The primary use intended for this information 
is for future comparative evaluation of the reliability 
measurements on LSDB with those from other sources. It is 
hoped that analytic information about the effects of 
programming, test, and management techniques can be gained from 
such comparisons. 

Data were received from SAMTEC through June of 1977 
when the developing contractor's contractual obligation to 
collect the data ended. 

2.2 Viking Data 

In order to establish an additional source of data, 
the cooperation of the Jet Propulsion Laboratory staff 
responsible for Viking ground data processing was solicited and 
received. This system was fully operational with limited 
development effort to correct errors and to make enhancements.
Data were received from April, 1977 through September, 1977 in
the form of status reports and IBM computer operating system
tapes. The June tape was unreadable and the September tape was
not received. 

No source of data equivalent to the SAMTEC Run 
Analysis form was available from JPL. However, it was possible 
from the information contained in the error discrepancy 
reporting system (VISAs) to determine which errors were
actually software-caused and to perform some failure rate and 
ratio calculations. 

3. DESCRIPTIVE AND COMPARATIVE ANALYSES OF LSDB PROGRAM MODULES

The raw data collected during the LSDB development at 
SAMTEC was examined during this phase of the study to determine 
if measures other than failure rate or ratio could be derived.
The analyses were done to provide insight into the detailed 
analyses that might be possible, or that should be performed. 
Variables such as the number of runs by module, types of runs, 
number of statement changes, number of lines of code, types of 
errors and types of statements such as assignment, logic and 
control were computed and compared. The results are given in 
the following paragraphs. 

3.1 Project ASTROS Data 

The total number of records of runs
available from Project ASTROS is 2,718. The sample selected is
2,700 (1,389 for 1976 and 1,310 for 1977) of which 514 were
unsuccessful. With the exception of 41 runs, all efforts 
indicated on the forms were in the category of program 
development. The distribution of program activities in the 
2700 run sample is given in Figure 3-1 and indicates a dominant 
mode of compile and run. The distribution of the number of 
statement changes is given in Figure 3-2. The severity of 
failure in 490 cases was local job failure only; four other 
cases were reported as miscellaneous and one was reported as 
real time. The error category distribution is given in Figure
3-3; the dominant modes were logic errors (97) and operation
errors (115). Single errors were detected in 419 of the
failures; however, this measure is questionable for the actual 
number of errors since the sequence of detection of errors in 
sequential runs is unknown. 

The distribution of the number of runs by Module is 
given in Figure 3-4. The BDP was the least used Module (185 
runs); the LSO was the most used (492 runs). During 1976, the 
LDG Module had the longest runs (CPU=300 sec.); all modules, 
except LSD, had at least one run of 199 sec. CPU time. The 
1977 pattern of module use showed an increase for BDT, LSD and 
LSO. LSD showed the longest run, 312 sec. CPU time. The distribution
of the percent of total CPU time by module is given in Figure 
3-5. 

The percent of successful runs by module is tabulated 
in Table 3-1. The average success rate for all modules 
improved from 77.1% in 1976 to 85.0% in 1977. 

The source code for the entire Metric Integrated 
Processing System (MIPS) was obtained from the contractor and a 
SNOBOL program (see Appendix A) was written to categorize the 
LSDB program in terms of statement type per module. This was 
done to assess the correspondence between error rates and 
program complexity as reflected by statement type. The
assumption was made that invocation of an external routine
(subroutine call), logical decision and branching, and looping
were statements of greater complexity than assignment. The
distribution of statement types between the various modules of
LSDB is tabulated in Table 3-2.

Figure 3-2. Number of Statement Changes in 2700 Run Sample

Figure 3-3. Types of Errors Encountered in 2700 Run Sample

Table 3-2. LSDB Statement Types

The results of the tabular analysis are shown in Table
3-2. The results indicate no clear pattern or relationship
between variables or statement type, use and failure ratio.

3.2 Viking Data

The Viking data exhibited failure characteristics that 
are similar to the ASTROS data in a number of ways. For 
example, Table 3-3 shows that the source of failure could be
attributed to the program in only 28% of the total. During the 
final month of data acquisition, the program errors constituted 
only 16% of the total. Most of the error sources were not 
explicitly identified. 

The data for the monthly distribution of CPU time, 
number of runs, failure ratio and failure rate are given in 
Table 3-4. There was no apparent significant decrease in the
recorded failure ratio or failure rate prior to the fourth
month of data acquisition. However, at the end of the six
month interval, both the failure rate and failure ratio were 
reduced to approximately one-half the beginning levels. Figure 
3-6 is a plot of the failure ratio for software only as well as 
the composite of all VISAs. The number of data points does not
provide an adequate data base for more detailed analyses. The
results do indicate a possible trend in which the failure ratio 
for software alone declines at a lower rate than the 
composite. The total number of recorded program failures did
not change significantly during the last four months of data 
acquisition; however, a significant increase in the failure 
incidence in some part of the system caused an increase to a 
level greater than the third month. The data are adequate to 
permit interpretation of this change. The trend, prior to that 
time, indicated that the total program was approaching a 
limiting level that would be asymptotic to the program failure 
rate. 

The failure ratio and failure rate for the operating 
system are recorded in Table 3-5. Both improved by
approximately a factor of three over the test interval. The
final ratio was 0.01. 

Table 3-5. Viking Operating System Recorded Failures

Figure 3-6. Viking Failure Ratio

3.3 Effects of Schedules

Scheduled reviews have an apparent effect on the rate 
of failure. Figure 3-7 reveals the effects of scheduling of
LSDB activities as reported by VAFB. The notation (n)
refers the reader to a point on Figure 3-7.

1) The high points: In-house testing of the module LDI
started in early 1976 (1). In April the testing was
reduced in order to reevaluate the testing. In the
April to May period the testing was resumed. (3)
represents the final testing of LDI and the testing of
LDG. (5) represents the testing of modules BDP, BDT,
BID, and LSD, and (7) represents the testing of LSO.

2) The low points: (2) was a period in which the
documentation for the PDR (Preliminary Design Review)
was produced. Points (4) September 1 and (6) December 15
were the times of the first and second CDRs (Critical
Design Reviews).

Reviewing the above data, it is clear that, at least in the
gross sense, the number (ratio) of failures occurring in a
module vs time is strongly a function of managerial action.
Telling the team what to test and when to test it influences
the maximum and minimum values of the curve. However, the
magnitude of the maxima is a dependent function of the number
of errors in the code (although how many are discovered is
again a function of the testing procedure).

4. FURTHER ANALYSES ON THE EXISTING DATA BASE


During the first phase of this study, failure rate and 
ratio measurements were plotted and simple linear regression
analyses performed.^ To gain the maximum benefit from the
data collection process, it was decided to subject the existing
data base to more rigorous analyses both to verify the validity
of possible measurements of reliability and to determine
meaningful data that should be collected for future studies. 

Since the data were acquired temporally, general time 
series analyses are possible for most parameters. The methods
include linear and non-linear regression on time, 
autocorrelation, limited spectral analyses and stochastic 
forecasting. The results of the latter three methods are 
deferred to the next section. 

Where the measurements were not adequate for 
parametric analyses, non-parametric analyses were performed.
These methods include tests for normality, goodness-of-fit to
theoretical distributions (deterministic models), correlation 
of variables or parameters, and tests for similarity and 
difference of variables. 

An additional and important method for analysis is the
between-module comparison for internal validity of
characterization and homogeneity. The comparison for external
validity was reported previously.^3

4.1 Regression Analysis

Nonlinear regression analyses were performed on
the failure rate and failure ratio measured parameters. The
method used, in most cases, was orthogonal polynomial
regression. This method is somewhat more complex than simple
least squares regression. However, the precision versus
complexity trade off of parameter estimates, and adaptability 
for assessing improvement achieved by adding coefficients for 
higher order terms justify the complexity. The method involves 
the computation of a set of coefficients for each data point 
and remapping of the orthogonal polynomials back into a 
fundamental regression equation. 
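
The general idea can be sketched as follows, assuming a Gram-Schmidt construction of polynomials that are orthogonal over the observed time points. This is an illustrative reconstruction, not the program used in the study; the function names and the sample data are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given ascending-power coefficients (Horner)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def orthogonal_basis(xs, degree):
    """Polynomials of degree 0..degree, mutually orthogonal over the points xs."""
    basis = []
    for k in range(degree + 1):
        p = [0.0] * k + [1.0]  # start from the monomial x^k
        for q in basis:
            qv = [poly_eval(q, x) for x in xs]
            num = sum(poly_eval(p, x) * v for x, v in zip(xs, qv))
            den = sum(v * v for v in qv)
            proj = num / den
            # Remove the component of p along q.
            p = [pc - proj * (q[i] if i < len(q) else 0.0)
                 for i, pc in enumerate(p)]
        basis.append(p)
    return basis


def orthogonal_fit(xs, ys, degree):
    """Return (weights on the orthogonal basis, ordinary power coefficients)."""
    basis = orthogonal_basis(xs, degree)
    weights = []
    for q in basis:
        qv = [poly_eval(q, x) for x in xs]
        weights.append(sum(y * v for y, v in zip(ys, qv)) / sum(v * v for v in qv))
    # Remap the orthogonal expansion back into a single regression equation.
    coeffs = [0.0] * (degree + 1)
    for w, q in zip(weights, basis):
        for i, qc in enumerate(q):
            coeffs[i] += w * qc
    return weights, coeffs


# Hypothetical weekly failure ratios.
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
ratio = [0.10, 0.14, 0.15, 0.14, 0.12, 0.09, 0.07, 0.05]
weights, coeffs = orthogonal_fit(weeks, ratio, 2)
print(weights, coeffs)
```

Because the basis is orthogonal over the sample points, the weights already computed for the lower-order terms do not change when a higher-order term is added, which makes it straightforward to judge the improvement contributed by each added coefficient.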

4.1.1 Composite Module Regression 

Nonlinear (second order) regression was applied so as 
to observe the asymptotic behavior of the data. The results of 
the regression for composite modules over 16 months are given 
in Figures 4-1 and 4-2 for the rate and ratio respectively. 

The results normalized by the number of statements are given in 
Figures 4-3 and 4-4. The average failure rate decreased from 
an initial value of approximately 1% to a value of 
approximately 0.1% at the end of the observation of program 
development. The failure ratio of failed to total runs dropped 
from 15% to 5% (approximately) during the observed development 
interval. 
As expected, the second order composite regression,
for data stratified by week, produces the same general range as
linear regression for failure rate and failure ratio estimates 
at the extremes. However, the more accurate fit reveals that 
the trend is toward an increase in failure rate and failure 
ratio from the initial value and a subsequent decrease with 
time. The regression demonstrated by the normalization of 
statement changes is shown in Figure 4-5. 

4.1.2 Module Comparison 

Comparisons of the trend in failure rate and failure
ratio normalized by the number of statement changes are shown in
Figures 4-6 through 4-10. Consistency in the decrease of the
failure rate and failure ratio of all modules both at the 
beginning and at the end of the observation period was observed. 

It may be seen that the general forms of the 
regression curves are inverted J's and inverted U's, with some of
the inverted J's having no significant up-turn. 

Figure 4-1. Composite Failure Rate

Figure 4-2. Composite Failure Ratio

Figure 4-3. Failure Rate Normalized by Number of Statements

Figure 4-7. LDG Failure Rate Normalized by Number of Changes

This characteristic, of course, is the equivalent
quadratic form for the first three terms of the negative
exponential given by

$$e^{-x} = 1 - x + \frac{x^2}{2!} + \epsilon(x)$$

for $x \le 1$ and

$$\epsilon(x) < \frac{x^3}{3!}$$
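
The size of the remainder term can be checked numerically. The short sketch below is an illustration only, with arbitrarily chosen values of x between 0 and 1; it compares the three-term quadratic form with the exponential and confirms that the magnitude of the remainder stays below x^3/3!.

```python
import math


def quadratic_exp_approx(x):
    """First three terms of the series for exp(-x)."""
    return 1.0 - x + x * x / 2.0


def remainder(x):
    """epsilon(x): the exact value minus the three-term approximation."""
    return math.exp(-x) - quadratic_exp_approx(x)


for x in (0.1, 0.5, 1.0):
    bound = x ** 3 / math.factorial(3)
    print(x, remainder(x), bound)
```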


4.2 Non-parametric Analyses

Before any statistical tests are performed, the data 
must be examined for level of measurement and distribution. 

For data having measurement precision sufficient for parametric 
tests, the sample distributions of undetermined form must be 
checked to determine if there is sufficient goodness-of-fit to
established theoretical distributions. For validity, such
tests require that underlying assumptions be met. Failing
either criterion, the data must be analyzed using non-parametric
techniques. Transformations are legitimate only if the data
can be transformed and the inverse transform of the results can
be mapped back into the original domain of the data for
consistent interpretation. 

Tests for goodness-of-fit to a normal distribution
were made on the number of statement changes, CPU time and
failure severity. The results of the Kolmogorov-Smirnov tests
are given in Table 4-1 for the successful runs and Table 4-2
for the runs in which program errors were detected. The
results indicate that none of these variates can be assumed to
have come from a normally distributed population with any 
reasonable confidence. 
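
A one-sample Kolmogorov-Smirnov test of this kind can be sketched as follows. The sketch makes several assumptions not stated in the report: the normal distribution is fitted with the sample mean and standard deviation, the two-tailed probability uses the asymptotic Kolmogorov series with a common small-sample correction, and the sample values are invented. Estimating the parameters from the same sample makes the quoted probability conservative.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variate."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(sample):
    """Maximum absolute difference between the sample CDF and a fitted normal CDF."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in sample) / (n - 1))
    d = 0.0
    for i, v in enumerate(sorted(sample)):
        cdf = normal_cdf(v, mu, sigma)
        # Check the gap just after and just before each step of the sample CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def kolmogorov_tail(lam):
    """Asymptotic probability that the scaled K-S statistic exceeds lam."""
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return max(0.0, min(1.0, 2.0 * total))


def ks_p_value(d, n):
    """Approximate two-tailed probability for a one-sample statistic d."""
    root_n = math.sqrt(n)
    return kolmogorov_tail((root_n + 0.12 + 0.11 / root_n) * d)


# Invented, heavily skewed CPU times (seconds): many short runs, a few long ones.
cpu_times = [1.0] * 40 + [300.0] * 3
d = ks_statistic_normal(cpu_times)
p = ks_p_value(d, len(cpu_times))
print(d, p)
```

A small probability, as with this skewed sample, argues against treating the variate as normally distributed.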

A test for goodness-of-fit to a Poisson distribution
was made on the CPU time by runs distribution. The results
indicated the probability of the sample distribution having
come from a Poisson distributed population was less than
0.00001. The same results were observed for success/failure of
runs. Therefore, Poisson models appear to be inappropriate
models for the variables. 

Tests were also made to determine the probability that 
the number of reported statement changes for successful and 
unsuccessful runs could have come from the same population.
The results of the tests indicated the probability to be less
than one chance in 100,000.

Another test was made to check the corroboration of
work category (program modification) similarity for successful
runs with runs having failures. The results of a
Kolmogorov-Smirnov test indicated a probability greater than 0.9999 that
the work categories were from the same population.
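
The two-sample form of the Kolmogorov-Smirnov test used for such comparisons can be sketched as below. The asymptotic probability formula and the small-sample correction are assumptions of this sketch, and the two samples of statement-change counts are invented for illustration.

```python
import math


def ks_two_sample(a, b):
    """Maximum absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        # Advance both samples past every observation equal to x.
        while i < n and a_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def kolmogorov_tail(lam):
    """Asymptotic probability that the scaled K-S statistic exceeds lam."""
    if lam < 0.2:
        return 1.0
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return max(0.0, min(1.0, 2.0 * total))


def ks_two_sample_p(d, n, m):
    """Approximate probability that both samples share one population."""
    root = math.sqrt(n * m / (n + m))
    return kolmogorov_tail((root + 0.12 + 0.11 / root) * d)


# Invented statement-change counts for successful and unsuccessful runs.
changes_success = [1, 1, 2, 2, 3, 3, 4, 5]
changes_failed = [10, 12, 15, 15, 20, 25, 30, 40]

d = ks_two_sample(changes_success, changes_failed)
p = ks_two_sample_p(d, len(changes_success), len(changes_failed))
print(d, p)
```

A probability near zero indicates that the two samples are unlikely to have come from the same population; a probability near one indicates that they are indistinguishable by this test.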

TABLE 4-1

K-S TESTS FOR NORMALITY OF VARIABLES
(2136 SUCCESSFUL RUNS)

VARIABLE                      MEAN    STD. DEV.   MAXIMUM ABSOLUTE   2-TAILED TEST
                                                  DIFFERENCE         P(H0)

NUMBER OF STATEMENT CHANGES   5       5           - 0.34             0.0000+

CPU TIME (Sec)                28.39   46.41       0.26               0.000+

TABLE 4-2

K-S TESTS FOR NORMALITY OF VARIABLES
(514 RUNS WITH DETECTED ERRORS)

VARIABLE                      MEAN    STD. DEV.   MAXIMUM ABSOLUTE   2-TAILED TEST
                                                  DIFFERENCE         P(H0)

NUMBER OF STATEMENT CHANGES   15      1.5         - 0.29             0.0000+

CPU TIME (Sec)                33.29   88.28       0.35               0.0000+

FAILURE SEVERITY              2.9     0.61        0.52               0.0000+

Kolmogorov-Smirnov tests were performed on variables
which were measurable at the appropriate level and for which a
sufficient number of runs were recorded. The tests were made
on the distributions for the number of statement changes in
successful runs compared to unsuccessful. The outcome is that
which might be expected intuitively. Specifically, the
probability that the number of changes was similar in both
cases was less than 0.001, which is stronger than might be
expected. In contrast, the work categories for successful runs
and unsuccessful runs are indistinguishable. The probability of
them being from the same population is 0.999.

TABLE 4-3

NONPARAMETRIC CORRELATION OF VARIABLES

VARIABLES COMPARED                                   CORRELATION   SIGNIFICANCE
(ORDERED)                                            COEFFICIENT   LEVEL

Failure severity with error category                  0.978        0.001

Failure severity with error count                     0.917        0.001

Error category with error count                       0.903        0.001

CPU time with number of statement changes             0.248        0.001

Work category (Program Mod.) with Program Activity    0.128        0.001

CPU time with Program Activity                       -0.461        0.001

Program Activity with number of statement changes    -0.3570       0.001

CPU time with error category                         -0.153        0.001

CPU time with error count                            -0.147        0.001

CPU time with failure severity                       -0.138        0.001

All other Parameter Comparisons                       0.11

4.3 Non-Parametric Results


The relationships between variables were examined 
using distribution-free (non-parametric) methods. The methods
included Spearman's non-parametric correlation analysis,
Kolmogorov-Smirnov tests for similarity (independence) and 
Chi-square tests for comparability. 

4.3.1 Non-parametric Correlation

Table 4-3 presents the results of the non-parametric
correlation analyses performed on variables that were measured
at the appropriate levels. It may be observed that the highest
positive correlation (on a scale from -1 to +1) is that (0.978)
between the failure severity and the error category. This
high value for correlation should be interpreted as being a 
measure of the concentration of failures for local job failure 
only. In contrast, the error category distribution is quite 
broad and multimodal. The second highest correlation is also 
attributable to the concentration of failure severity into one 
category. A similar effect was observed for correlation of 
error count (number of errors) with any other variable. As it 
should be, the correlation of error count with the category was 
high (0.903). 

The next group of correlation coefficients is not as 
impressive but perhaps provides more insight into relationships 
that are not as intuitively obvious. The CPU time was 
correlated with a number of variables. The CPU time is 
distributed over 177 categories, with the highest percentages 
in the first 12 categories; for the next 12 categories the CPU 
time dropped to approximately one-third the average for the 
first 12 and continued as a long tail over the remaining 
categories. The general form is that of a negative 
exponential, which of itself is not significant. However, in 
terms of potential inference rather than form, the 
characteristic is similar to a Chi-square distribution with 
three degrees of freedom. This may or may not be due to 
chance, but if it is significant, future studies might be 
directed toward the decomposition of the CPU time dependency 
upon a small number (4) of factors. It should also be 
cautioned that apparent variables may not be independent but 
interactive instead. In any event, the data as recorded do 
not permit factor analysis, and therefore the non-parametric 
correlation of CPU time with other variables was computed as 
given in Table 4-3. 

The variable found to have the most significant positive 
correlation with CPU time was the number of statement changes 
(0.248). The coefficient is not high in absolute value, but it 
is relatively high compared to other variables. The two 
relatively high negative correlations are due to the arbitrary 
ordering of the program activity variable, which is composed 
of combinations of compile/run activities. The correct 
interpretation of the results should be that there is 
relatively high correlation of program activity with CPU time 
and with the number of statement changes, respectively. The 
other correlations are of lesser magnitude; the proper 
interpretation is as given by the sign in the table. 
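
The rank correlation used in this section, and the way an arbitrary ordering of a categorical variable can reverse its sign, can be illustrated with a short sketch. The functions `average_ranks` and `spearman_rho` and the coded data are hypothetical; the report does not describe the program actually used for the analysis.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical coded observations (not from the report's data base).
cpu_time_class = [1, 2, 2, 3, 4, 5, 5, 6]
activity_code = [4, 4, 3, 3, 2, 2, 1, 1]           # one arbitrary coding
activity_recoded = [5 - c for c in activity_code]  # same categories, reversed order

print(spearman_rho(cpu_time_class, activity_code))
print(spearman_rho(cpu_time_class, activity_recoded))
```

Reversing the coding changes only the sign of the coefficient, not its magnitude, which is why the negative program activity entries in Table 4-3 are read as relatively high correlations.
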





5. Reliability Forecasts 


Any sequence of tests or experiments must eventually 
be concluded. The key question is when to stop. There are a 
number of answers to the question that are premised upon given 
criteria or values. In either case, the future reliability 
must be addressed. For example, the criteria could be the 
maximum deviation of a sample from a deterministic estimate, or 
the maximum mean-square error between samples at a given 
confidence level. Another answer could be to stop when a 
measure of failure converges, or when the forecast converges to 
some value or has a well defined trend that passes through 
zero. The forecasts for the failure rates and failure ratios 
were computed for the composite (ensemble average) of the five 
modules and the individual modules. 

The method for forecasting is based on work first 
published by G. U. Yule and refined by Box and Jenkins (7). It 
is a stochastic method that does not depend upon the 
assumptions required for a deterministic and stationary model. 
The name of the autoregressive integrated moving average 
(ARIMA) method is somewhat of a misnomer in that the 
"integration" evolved from a hardware application concept 
which makes use of a nonstationary summation filter. 
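
The role of the "integration" (summation) step can be seen in a minimal sketch of the simplest such model: a first-order autoregression fitted to the month-to-month differences and summed back up to produce the forecast. This is an assumed ARIMA(1,1,0) form with no constant term, fitted by ordinary least squares. The report does not state the model orders or the estimation procedure used, and the full Box and Jenkins approach also involves model identification, diagnostic checking, and confidence bands, none of which appear here. The series is invented for illustration.

```python
def fit_ar1_on_differences(series):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + e[t], d = diff(series)."""
    d = [b - a for a, b in zip(series, series[1:])]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den, d  # den is zero only for a constant series


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead by predicting differences and summing them."""
    phi, d = fit_ar1_on_differences(series)
    level, last_diff = series[-1], d[-1]
    forecast = []
    for _ in range(steps):
        last_diff = phi * last_diff  # autoregressive step on the differences
        level = level + last_diff    # summation restores the original level
        forecast.append(level)
    return forecast


# Hypothetical monthly composite failure ratios (16 months, not report data).
ratio = [0.09, 0.07, 0.08, 0.06, 0.05, 0.055, 0.045, 0.04,
         0.042, 0.035, 0.03, 0.031, 0.028, 0.027, 0.026, 0.025]
for month, value in enumerate(forecast_arima_110(ratio, 9), start=17):
    print(month, round(value, 4))
```
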

The data plots (as stratified by month) and forecasts 
for nine months beyond the 16 month test period are given in 
Figures 5-1 and 5-2 for composite failure rate and composite 
failure ratio respectively. The 95% confidence bands for 
forecasts are indicated. Where the lower band goes below zero 
it is omitted. It may be seen from Figure 5-1 that the 
forecast for the composite failure rate converges to 
approximately 0.02. Figure 5-2 reveals that the forecast for 
the composite ratio trends toward zero after remaining at 
approximately 0.025 for three months. 

Figures 5-3 and 5-4 present additional normalized 
failure ratio forecast examples. In Figure 5-3 the forecast of 
the BDP failure ratio, as normalized by the number of 
statements, predicts that the trend would approach zero 
asymptotically in approximately six months following the end 
of the recorded tests. The trend declined from the initial 
stochastic estimate of approximately 0.02 per 10' statements. 
The upturn toward the end of the test period is attributable 
to the number of changes to the program module. Figure 5-4 
reveals a similar forecast trend for the BDT module, except 
that the zero asymptote is predicted for eight months after 
the end of the recorded data interval. The absence of an 
upturn is apparently due to fewer program changes. The LDG 
failure ratio as presented in Figure 5-5 is relatively flat 
(on the average) for the first 10 months and begins a downward 
trend in January of 1977 toward zero in June of 1977. The 
forecast predicts that the normalized failure ratio of the LDG 
module should have converged toward zero by the beginning of 
1978, provided the type of perturbations introduced after the 
end of the data acquisition period were not significantly 
different from the perturbations encountered during the 15 
months in which data were collected. The LDI and LSO failure 
ratios are given in Figures 5-6 and 5-7 respectively and are 
quite similar to the LDG ratio as previously discussed. 

The failure rate characteristics with forecasts are 
given in Figures 5-8 through 5-12. The BDP module exhibits 
failure rate characteristics in Figure 5-8 that are quite 
similar to the BDP failure ratio. However, for the BDT module 
the correspondence between the failure rate, as presented in 
Figure 5-9, and the failure ratio is not as good. The best 
forecast estimate based on all past BDT module measurements 
produces divergence from zero. However, this may be 
influenced by the wide divergence of failure rate at the 
beginning of the test period. If only the last 10 months of 
the test data were used, the forecast would likely converge. 
This module obviously had rather severe problems initially. 
The best estimate for the failure rate of the LDG module is 
almost linear, and the forecast indicates that its failure 
rate approaches zero sooner than the rate for the BDT module 
and at about the same time as the BDP module. The LDI and LSO 
failure rate characteristics, as exhibited in Figures 5-11 and 
5-12, are not significantly different, as a function of time, 
from the failure ratio forecasts. 

The essence of this analysis is that the ensemble 
averages of both the failure rate and the failure ratio are 
stationary and provide a basis for forecasting the program 
reliability. Individual module forecasts may not be as well 
behaved. Future study should provide an opportunity to test 
and verify these methods for forecasting. 




Figure 5-1. Composite Failure Rate 
Figure 5-3. BDP Failure Ratio - Normalized by Number of Statements 
Figure 5-4. BDT Failure Ratio - Normalized by Number of Statements 
Figure 5-7. LSO Failure Ratio - Normalized by Number of Statements 
Figure 5-9. BDT Failure Rate - Normalized by Number of Statements 
Figure 5-10. LDG Failure Rate - Normalized by Number of Statements 
Figure 5-11. LDI Failure Rate - Normalized by Number of Statements 
Figure 5-12. LSO Failure Rate - Normalized by Number of Statements 


6. SIGNIFICANT FINDINGS 


Specific findings from this study and potential 
applications include the following: 

1. Meaningful measurement of software reliability 
during development is feasible. These measurements should be 
useful to line management as a systematic method for assessing 
the progress of software reliability and identifying and 
comparing sources. 

2. Data acquisition for measurement of software 
reliability requires a deliberately distinct effort. The data 
normally recorded for systems records are not adequate for 
software reliability measurements. All personnel involved 
should be fully aware of this limitation. 

3. Most of the failures during development were not 
due to coding errors but, rather, were caused by associated 
data processing procedures. Such an outcome suggests that 
management might be able to enhance program reliability during 
development by establishing standards for data handling and 
program operation in general. Time, effort, and costs should 
be reduced if appropriate procedures are implemented and 
conscientiously followed. 

4. The failure processes are not accurately described 
by deterministic methods; stochastic processes are apparent. 
Therefore, simplistic generalized models should be closely 
scrutinized before being employed. A generalized method may be 
adapted to modelling of a specific case or set of data. 
However, the converse is not legitimate. Specifically, 
changing coefficients and exponents (of a deterministic model) 
that are derived from a single set of data does not produce a 
"generalized" model of anything. 

5. Scheduling or other management actions appear to 
have a significant effect on the rate of occurrence of failure 
during development. Such interactions are apparent 
contributors to widely varying excursions in failure events. 
Line management, project management and functional (software 
development) management should be alert individually to the 
potential for such induced problems. 

6. The natural outcome of some of the measurements 
produced data that were stratified into a limited number of 
categories. The analysis of such data must be restricted to 
theoretically sound and verified methods. Non-parametric 
(distribution-free) methods should be used where appropriate 
and where inverse transformations of results (as well as 
transformations of data) cannot be validated. Pretest of data 
acquisition procedures and instruments is strongly recommended. 

7. Stochastic methods may be used at the end of a 
given time interval for estimating future reliability. This 
capability leads to criteria for deciding when to stop 
development testing. Examples are a forecast trend that is 
asymptotic to an acceptable level of error, or is stationary 
about zero. This should provide both management and 
researchers with a basic tool for assessing whether programs 
will meet future reliability goals, for comparing the 
reliability of programs, and for weighing the benefits of 
continued testing against the incurred costs of time and 
effort. 
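
One way such a stopping criterion might be written down is sketched below. The function name, the two thresholds, and the forecast values are assumptions chosen for illustration; the study does not prescribe specific numerical limits.

```python
def forecast_supports_stopping(forecast, acceptable_level, tolerance):
    """True when the forecast has levelled off at or below the acceptable level.

    forecast          -- forecast values, oldest first
    acceptable_level  -- largest failure measure regarded as acceptable (assumed)
    tolerance         -- largest step-to-step change still counted as "level" (assumed)
    """
    if len(forecast) < 2:
        return False
    tail_change = abs(forecast[-1] - forecast[-2])
    return tail_change <= tolerance and forecast[-1] <= acceptable_level


# Hypothetical nine-month forecast of a failure ratio.
nine_month_forecast = [0.025, 0.025, 0.025, 0.015, 0.008,
                       0.004, 0.002, 0.001, 0.0005]
print(forecast_supports_stopping(nine_month_forecast, 0.005, 0.001))
```

A forecast that is still rising, or that levels off above the acceptable value, would return False and argue for continued testing.
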


7. CONCLUSIONS AND RECOMMENDATIONS 

Data collected during the development of a software 
system needed for ground based launch support at the Air Force 
Space and Missile Test Center, Vandenberg Air Force Base, 
California, and from the operational Viking ground data 
processing system at the Jet Propulsion Laboratory, Pasadena, 
California, were analyzed to determine if any valid measures of 
software reliability could be made that might have utility when 
applied to operational avionics systems to predict their 
reliability. 

The failure rate (number of failures divided by CPU 
seconds for the calendar interval) and the failure ratio 
(number of failures divided by the total number of runs for the 
calendar interval) emerged as valid measures. They were 
subjected to linear, and to nonlinear orthogonal polynomial, 
regression analyses which confirmed their validity as 
indicators of system stability. 
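
The two measures follow directly from per-run records of the kind captured on the run analysis form. The sketch below computes them per calendar interval; the record layout, the function name `monthly_measures`, and the sample runs are illustrative assumptions rather than the study's data.

```python
from collections import defaultdict


def monthly_measures(runs):
    """Return {interval: (failure_rate, failure_ratio)}.

    runs -- iterable of (interval, succeeded, cpu_seconds) tuples
    failure_rate  = failed runs / total CPU seconds in the interval
    failure_ratio = failed runs / total runs in the interval
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # failures, runs, CPU seconds
    for interval, succeeded, cpu_seconds in runs:
        entry = totals[interval]
        entry[0] += 0 if succeeded else 1
        entry[1] += 1
        entry[2] += cpu_seconds
    return {
        interval: (f / cpu if cpu else float("nan"), f / n)
        for interval, (f, n, cpu) in totals.items()
    }


# Hypothetical run records: (interval label, successful?, CPU seconds).
sample_runs = [
    ("M01", True, 10.0), ("M01", False, 30.0),
    ("M02", False, 5.0), ("M02", False, 15.0), ("M02", True, 20.0),
]
for interval, (rate, ratio) in sorted(monthly_measures(sample_runs).items()):
    print(interval, round(rate, 4), round(ratio, 4))
```
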

The composite failure rate and ratio data were also 
used to forecast the reliability of the system for nine months 
following the seventeen month test period for which data 
existed. The forecast predicted that the failure rate would 
converge to 0.002 and the ratio would converge to near zero 
after an initial three months at 0.025. This forecast could 
not be validated against real-world experience since the data 
collection process had ceased after the seventeen month 
period. This lack of corroborating data emphasizes the 
criticality of defining the scope of the data collection 
process at the outset to ensure the availability of necessary 
data. 

The raw data plots of failure rate and ratio exhibited 
both high and low points. Project staff at SAMTEC was queried 
as to any events that might have caused these and it was 
learned that the high points were all directly related to the 
start of intensive periods of testing and the lows to relative 
inactivity due to program review preparation. The concerned 
project manager should note from this that other than pure 
software problems can impact apparent progress. 

The techniques of measurement discussed in this report 
appear promising as indicators of reliability. It is 
recommended that they be applied to operational avionics 
systems with a recorded history of failures to accomplish the 
further step of establishing an effective measure of software 
reliability analogous to hardware mean time to failure. 
Careful attention should be paid to data collection to ensure 
the quality and continuity of the data base, including 
separation of actual software changes. The establishment and 
analysis of this data base would be a major contribution 
towards the goal of system certifiability. 

NUMBER OF BLOCK DATA STATEMENTS IS... 0 
NUMBER OF <DO UNTIL> STATEMENTS IS... 0 
NUMBER OF NAMELIST STATEMENTS IS... 0 
NUMBER OF REWIND STATEMENTS IS... 0 
NUMBER OF READ STATEMENTS IS... 0 
NUMBER OF <DO LABEL> STATEMENTS IS... 0 
NUMBER OF EXIT STATEMENTS IS... 331 
NUMBER OF LABEL STATEMENTS IS... 0 
NUMBER OF <DO WHILE> STATEMENTS IS... 0 


***** FUNCTION FEREIT ***** 


00000010 


NUMBER OF COMMENT CARDS IS... 957 
NUMBER OF TYPE CARDS IS... 57 
NUMBER OF COMMON CARDS IS... .'9 
NUMBER OF EQUIVALENCE CARDS IS... 95 
NUMBER OF DIMENSION CARDS IS... 9 
NUMBER OF DATA CARDS IS... 5 
NUMBER OF CALL STATEMENTS IS... 76 
NUMBER OF IF STATEMENTS IS... 109 
NUMBER OF EXECUTE STATEMENTS IS... 60 
NUMBER OF STOP STATEMENTS IS... 5 
NUMBER OF ELSE STATEMENTS IS... 11 
NUMBER OF END STATEMENTS IS... 196 
NUMBER OF ASSIGNMENT STATEMENTS IS... 0 
NUMBER OF <DO FOR> STATEMENTS IS... S^ 
NUMBER OF <UNDO> STATEMENTS IS... 14 
NUMBER OF PROCEDURE STATEMENTS IS... 31 
NUMBER OF <CYCLE> STATEMENTS IS... 0 
NUMBER OF UNRECOGNIZED STATEMENTS IS... 0 
NUMBER OF <DO CASE> STATEMENTS IS... S 
NUMBER OF <CASE> STATEMENTS IS... Cl 
NUMBER OF WRITE STATEMENTS IS... Cl 
NUMBER OF FORMAT STATEMENTS IS... 21 
NUMBER OF <SUBROUTINE> STATEMENTS IS... 4 
NUMBER OF <RETURN> STATEMENTS IS... s 
NUMBER OF BLOCK DATA STATEMENTS IS... 0 
NUMBER OF <DO UNTIL> STATEMENTS IS... 0 
NUMBER OF NAMELIST STATEMENTS IS... 0 
NUMBER OF REWIND STATEMENTS IS... 0 
NUMBER OF READ STATEMENTS IS... 0 
NUMBER OF <DO LABEL> STATEMENTS IS... 0 
NUMBER OF EXIT STATEMENTS IS... 460 
NUMBER OF LABEL STATEMENTS IS... 0 
NUMBER OF <DO WHILE> STATEMENTS IS... 0 


***** FUNCTION GCDOIT ***** 


00001200 


NUMBER OF COMMENT CARDS IS... 652 
NUMBER OF TYPE CARDS IS... <iO 
NUMBER OF COMMON CARDS IS... 32 
NUMBER OF EQUIVALENCE CARDS IS... 13 
NUMBER OF DIMENSION CARDS IS... 
NUMBER OF DATA CARDS IS... 15 
NUMBER OF CALL STATEMENTS IS... 58 
NUMBER OF IF STATEMENTS IS... 1^3 
NUMBER OF EXECUTE STATEMENTS IS... 87 
NUMBER OF STOP STATEMENTS IS... 8 
NUMBER OF ELSE STATEMENTS IS... 71 
NUMBER OF END STATEMENTS IS... 205 
NUMBER OF ASSIGNMENT STATEMENTS IS... 0 
NUMBER OF <DO FOR> STATEMENTS IS... 39 
NUMBER OF <UNDO> STATEMENTS IS... 22 
NUMBER OF PROCEDURE STATEMENTS IS... 39 
NUMBER OF <CYCLE> STATEMENTS IS... 1 
NUMBER OF UNRECOGNIZED STATEMENTS IS... 0 
NUMBER OF <DO CASE> STATEMENTS IS... 2 
NUMBER OF <CASE> STATEMENTS IS... 11 
NUMBER OF WRITE STATEMENTS IS... 119 
NUMBER OF FORMAT STATEMENTS IS... 119 
NUMBER OF <SUBROUTINE> STATEMENTS IS... 9 
NUMBER OF <RETURN> STATEMENTS IS... 5 

NUMBER OF COMMENT CARDS IS... 3679 
NUMBER OF TYPE CARDS IS... CC2 
NUMBER OF COMMON CARDS IS... 166 
NUMBER OF EQUIVALENCE CARDS IS... 30 
NUMBER OF DIMENSION CARDS IS... 9 
NUMBER OF DATA CARDS IS... co 
NUMBER OF CALL STATEMENTS IS... ISC 
NUMBER OF IF STATEMENTS IS... 3C^* 
NUMBER OF EXECUTE STATEMENTS IS... 182 
NUMBER OF STOP STATEMENTS IS... 6 
NUMBER OF ELSE STATEMENTS IS... 102 
NUMBER OF END STATEMENTS IS... 33S 
NUMBER OF ASSIGNMENT STATEMENTS IS... 0 
NUMBER OF <DO FOR> STATEMENTS IS... 62 
NUMBER OF <UNDO> STATEMENTS IS... 35 
NUMBER OF PROCEDURE STATEMENTS IS... 6^ 
NUMBER OF <CYCLE> STATEMENTS IS... 2 
NUMBER OF UNRECOGNIZED STATEMENTS IS... 0 
NUMBER OF <DO CASE> STATEMENTS IS... 7 
NUMBER OF <CASE> STATEMENTS IS... 61 
NUMBER OF WRITE STATEMENTS IS... 218 
NUMBER OF FORMAT STATEMENTS IS... 1^3 
NUMBER OF <SUBROUTINE> STATEMENTS IS... 9 
NUMBER OF <RETURN> STATEMENTS IS... 10 
NUMBER OF BLOCK DATA STATEMENTS IS... 1 
NUMBER OF <DO UNTIL> STATEMENTS IS... 0 
NUMBER OF NAMELIST STATEMENTS IS... 0 
NUMBER OF REWIND STATEMENTS IS... 0 
NUMBER OF READ STATEMENTS IS... 0 
NUMBER OF <DO LABEL> STATEMENTS IS... 0 
NUMBER OF EXIT STATEMENTS IS... 1009 
NUMBER OF LABEL STATEMENTS IS... 0 
NUMBER OF <DO WHILE> STATEMENTS IS... 0 


END-OF-JOB 

NUMBER OF BLOCK DATA STATEMENTS IS... 1 
NUMBER OF <DO UNTIL> STATEMENTS IS... 0 
NUMBER OF NAMELIST STATEMENTS IS... 0 
NUMBER OF REWIND STATEMENTS IS... 
NUMBER OF READ STATEMENTS IS... 0 
NUMBER OF <DO LABEL> STATEMENTS IS... 0 
NUMBER OF EXIT STATEMENTS IS... 563 
NUMBER OF LABEL STATEMENTS IS... 0 
NUMBER OF <DO WHILE> STATEMENTS IS... 0 


***** FUNCTION STUFIT ***** 


00000010 


NUMBER OF COMMENT CARDS IS... A5E 
NUMBER OF TYPE CARDS IS... E6 
NUMBER OF COMMON CARDS IS... 34 
NUMBER OF EQUIVALENCE CARDS IS... 6 
NUMBER OF DIMENSION CARDS IS... 2 
NUMBER OF DATA CARDS IS... 17 
NUMBER OF CALL STATEMENTS IS... 26 
NUMBER OF IF STATEMENTS IS... 143 
NUMBER OF EXECUTE STATEMENTS IS... 102 
NUMBER OF STOP STATEMENTS IS... 6 
NUMBER OF ELSE STATEMENTS IS... 53 
NUMBER OF END STATEMENTS IS... 197 
NUMBER OF ASSIGNMENT STATEMENTS IS... 0 
NUMBER OF <DO FOR> STATEMENTS IS... 35 
NUMBER OF <UNDO> STATEMENTS IS... 26 
NUMBER OF PROCEDURE STATEMENTS IS... 30 
NUMBER OF <CYCLE> STATEMENTS IS... 0 
NUMBER OF UNRECOGNIZED STATEMENTS IS... 0 
NUMBER OF <DO CASE> STATEMENTS IS... 10 
NUMBER OF <CASE> STATEMENTS IS... 59 
NUMBER OF WRITE STATEMENTS IS... 36 
NUMBER OF FORMAT STATEMENTS IS... 35 
NUMBER OF <SUBROUTINE> STATEMENTS IS... 3 
NUMBER OF <RETURN> STATEMENTS IS... 5 
NUMBER OF BLOCK DATA STATEMENTS IS... 0 
NUMBER OF <DO UNTIL> STATEMENTS IS... 4 
NUMBER OF NAMELIST STATEMENTS IS... 1 
NUMBER OF REWIND STATEMENTS IS... 2 
NUMBER OF READ STATEMENTS IS... 3 
NUMBER OF <DO LABEL> STATEMENTS IS... 2 
NUMBER OF EXIT STATEMENTS IS... 340 
NUMBER OF LABEL STATEMENTS IS... 2 
NUMBER OF <DO WHILE> STATEMENTS IS... 0 


***** FUNCTION WRITON ***** 


00000015 


APPENDIX C 


DATA ACQUISITION FORMS 

COMPUTER PROGRAM RUN ANALYSIS REPORT 
INSTRUCTIONS 


To be filled out by programming librarian or responsible programmer 
after each computer run. If the run was unsuccessful (syntax errors, 
abort, calculation error, loop, etc.), the supplemental form 
COMPUTER PROGRAM FAILURE ANALYSIS REPORT should also be completed. 

This form will yield error statistic data and computer run time data. 

1. Use program mnemonic. 

2. This time is start time of computer execution from the computer 
   printout. 

3. If answer is no, complete COMPUTER PROGRAM FAILURE ANALYSIS REPORT. 

4. This can be gotten from the computer printout. 

5. Check the appropriate box. 

6. Check the appropriate box. 

7. Check the appropriate box. 

8. Check the appropriate box. 


DATE 

COMPUTER PROGRAM RUN ANALYSIS REPORT 

1. Computer Program Component ID 

2. Run Date: Day Mon Yr Hr Min 

3. Successful Run? 

4. CPU Time: Min Sec 

5. Category of Work: 

   a. Program Development                                [ ] 
   b. Program Modification: 
      (1) Implementation of Additional Requirement       [ ] 
      (2) Implementation of Hardware Change              [ ] 
      (3) Memory/Time Optimization Enhancement           [ ] 
      (4) Error Correction                               [ ] 
      (5) Design Modification                            [ ] 
   c. Program Conversion                                 [ ] 
   d. Other                                              [ ] 

6. CPCI/CPC Status 

   a. CPC Test and Eval     [ ]    c. Full Integ. Test      [ ] 
   b. Partial Integ. Test   [ ]    d. Production Program    [ ] 
                                   e. Other                 [ ] 

7. Program Activity 

   a. Compilation           [ ]    c. Run with no compile   [ ] 
   b. Compile and run       [ ]    d. Other                 [ ] 

8. Number of Source Statements Changed/Deleted/Inserted 

   a. None    [ ]    e. 31-40    [ ]    i. 101-150    [ ] 
   b. 1-10    [ ]    f. 41-50    [ ]    j. 151-200    [ ] 
   c. 11-20   [ ]    g. 51-75    [ ]    k. Over 200   [ ] 
   d. 21-30   [ ]    h. 76-100   [ ] 

Contact 


COMPUTER PROGRAM FAILURE ANALYSIS REPORT 
INSTRUCTIONS 


To be filled out by the responsible developer for each unsuccessful 
run. The failure information should be available on the program 
printout or from the computer operator. The error data can be 
derived from an analysis of the program output. (It is possible 
that a failure can be caused by more than one error; list them all.) 

1. Use program mnemonic. 

2. This time is start time of computer execution from the computer 
   printout. 

3. Check box which most nearly describes the failure indication. If 
   other is checked, briefly describe failure. 

4. The count under the error category means number of errors, not 
   number of erroneous statements. 

   A. Examples of COMPUTATIONAL ERRORS include: (1) Incorrect operand 
      in equation, (2) Incorrect use of parenthesis, (3) Sign 
      convention error, (4) Units or data conversion error, 
      (5) Computation produces an over/underflow, (6) Incorrect 
      equation used, (7) Precision lost due to mixed mode, 
      (8) Missing computations, (9) Rounding or truncation error 
      and loop. 

   B. Examples of LOGIC ERRORS include: (1) Incorrect operand in 
      logical expression, (2) Logic activities out of sequence, 
      (3) Wrong variable being checked, (4) Missing logic or 
      condition tests, (5) Too many/too few statements in loop, 
      (6) Loop iterated incorrect number of times (including 
      endless loop). 

   C. Examples of DATA INPUT ERRORS include: (1) Invalid input read 
      from correct data file, (2) Input read from incorrect data 
      file, (3) Incorrect input format, (4) Incorrect format 
      statement referenced, (5) EOF encountered prematurely, 
      (6) EOF missing. 

   D. Examples of DATA HANDLING ERRORS include: (1) File not rewound 
      before reading, (2) Data initialization not done, (3) Data 
      initialization done improperly, (4) Variable used as a flag 
      or index not set properly, (5) Variable referred to by wrong 
      name, (6) Variable type is incorrect, (7) Data 
      packing/unpacking error, (8) Sort error, (9) Subscripting 
      error. 

   E. Examples of DATA OUTPUT ERRORS include: (1) Data written on 
      wrong file, (2) Data written using wrong format statement, 
      (3) Data written in the wrong format, (4) Data written with 
      wrong carriage control, (5) Incomplete or missing output, 
      (6) Output field size too small, (7) Line count and page 
      eject problems. 

   F. Examples of INTERFACE ERRORS include: (1) Wrong subroutine 
      called, (2) Call to subroutine made in wrong place, 
      (3) Subroutine arguments not consistent in type, units, 
      order, etc., (4) Subroutine called is nonexistent. 

   G. Examples of ARRAY PROCESSING ERRORS include: (1) Array not 
      properly dimensioned, (2) Array referenced out of bounds, 
      (3) Array being referenced at incorrect location, (4) Array 
      pointers not incremented properly. 

   H. Examples of DATA BASE ERRORS include: (1) Data should have 
      been initialized in data base but wasn't, (2) Data 
      initialized to incorrect value in data base, (3) Data base 
      units are incorrect. 

   I. Examples of OPERATION ERRORS include: (1) Operating system 
      error, (2) Hardware error, (3) Operator error, (4) Test 
      execution error. 

   J. Examples of PROGRAM EXECUTION ERRORS include: (1) Time limit 
      exceeded, (2) Core storage limit exceeded, (3) Output line 
      limit exceeded, (4) Compilation error. 

   K. Examples of DOCUMENTATION ERRORS include: (1) User manual 
      error, (2) Interface error, (3) ... spec error, 
      (4) Requirements spec error. 

   L. Briefly describe the c 


SYSTEM                                        DATE 

COMPUTER PROGRAM FAILURE ANALYSIS REPORT 

1. Computer Program Component ID 

2. Run Date: Day Yr Hr 

3. Severity of Failure 

   A. Caused Complete System to Crash       [ ] 
   B. Caused A Dependent Job to Fail        [ ] 
   C. Local Job Failure Only                [ ] 
   D. Real Time Failure                     [ ] 
   E. Other                                 [ ] 

4. Error Category                           Count 

   A. Computational Error                   [ ] 
   B. Logic Error                           [ ] 
   C. Data Input Error                      [ ] 
   D. Data Handling Error                   [ ] 
   E. Data Output Error                     [ ] 
   F. Interface Error                       [ ] 
   G. Array Processing Error                [ ] 
   H. Data Base Error                       [ ] 
   I. Operation Error                       [ ] 
   J. Program Execution Error               [ ] 
   K. Documentation Error                   [ ] 
   L. Other                                 [ ] 

Contact 
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real-time criticality index is a measure of the penalty 
incurred for a late completion of the system mission. Thus an 
air-traffic control system would have a high RTC compared to 
other systems. The inclusion factor (defined later) is a 
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Abstract: It is probably obvious that B, the number 

of errors that a programmer might make in implementing any 
given algorithm in any given programming language, depends upon 
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recently, it has also been equally obvious that there was 
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called Algorithm Dynamics or Software Physics (6, 8, 9, 10, 
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program. A few experiments on Programmer Productivity (5, 7, 
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accounts for the combined effects of program volume and program 
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possibilities for making an erroneous discrimination. In the 
following sections we will reproduce the hypothesis and apply 
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(probabilistic and deterministic) and used to assess 
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practical role of testing in software development. We prove a 
fundamental theorem showing that properly structured tests are 
capable of demonstrating the absence of errors in a program. 

The theorem's proof hinges on our definition of test 
reliability and validity, but its practical utility hinges on 
being able to show when a test is actually reliable. We 
explain what makes tests unreliable (for example, we show by 
example why testing all program statements, predicates, or 
paths is not usually sufficient to insure test reliability), 
and we outline a possible approach to developing reliable 
tests. We also show how the analysis required to define 
reliable tests can help in checking a program's design and 
specifications as well as in preventing and detecting 
implementation errors. 

Green, T. F., Schneidewind, N. F., Howard, G. T., and Pariseau, R. 
J. "Program Structure Complexity and Error Characteristics," 

The ability to detect and correct errors in a computer 
program is governed to a great extent by the structure of the 
program. Structure is important in two ways: (1) errors are 
more difficult to find in complex structures; and (2) more 
errors are generated initially during programming with complex 
structures. A method of characterizing structure is to 
represent the program logic in the form of a directed graph, 
where nodes and arcs represent decision instructions and 
straight line coding, respectively. This representation can be 
analyzed in terms of the following measures: probability of 
reaching an arc with an input; test coverage achieved with N 
inputs; numbers of nodes and arcs; and ratio of actual to 
maximum number of arcs. Since program structure is most 
meaningful when related to the distribution of possible errors 
in the program, the ability to detect errors in various 
structures is studied. This is accomplished by employing an 
error detection simulation model. The relationships which are 
analyzed are error detection and test coverage as a function of 
program structure and number of inputs. These functions would 
be used in the design of software to avoid structures which are 
difficult to test and during testing for allocating resources 
to tests in accordance with structure and error detection 
characteristics. 

As expected, it was found that the ability to detect 
errors decreases with increasing complexity. This was caused 
by program coverage decreasing with increasing complexity. An 
interesting aspect of the results is the asymptotic nature of 
the functions, which demonstrates the difficulty of finding 
additional errors after a critical value of coverage has been 
achieved, where the critical value of coverage is relatively 
low in complex structures. 

Haines, Andrew L.: "Some Contributions to the Theory of 
Restricted Classes of Distributions with Applications to 
Reliability", M73-35, The MITRE Corporation, Washington 
Operations, May 1973. 

Haney, F. M.: "Module Connection Analysis - A Tool for 
Scheduling Software Debugging Activities", AFIPS Conference 
Proceedings, Volume 41, Part I, 1972, 173-179. 

Hecht, H., Measurement, Estimation, and Prediction of Software 
Reliability, NASA CR-145135, National Aeronautics and Space 
Administration, Washington, DC (January 1977). Also in 
Software Engineering Techniques, Infotech International Ltd., 
Maidenhead, Berkshire, England (1977), Vol. 2, p. 209-244. 

IEEE; Record, 1973 IEEE Symposium on Computer Software 
Reliability, New York City, April 30 - May 2 1973, No. 73 CHO 
0741-9 CSR. 

Itoh, D., and Izutani, T., "FADEBUG-I, a New Tool for Program 
Debugging," Record, 1973 IEEE Symposium on Computer Software 
Reliability, IEEE Catalog No. 73 CHO 0741-9 CSR. 

Jaynes, Edwin T.: "Prior Probabilities", IEEE Trans. on 
Systems Science and Cybernetics, Vol. SSC-4, No. 3, September 
1968, 227-241. 

Abstract: In decision theory, mathematical analysis 
shows that once the sampling distribution, loss function, and 
sample are specified, the only remaining basis for a choice 
among different admissible decisions lies in the prior 
probabilities. Therefore, the logical foundations of decision 
theory cannot be put in fully satisfactory form until the old 
problem of arbitrariness (sometimes called "subjectiveness") in 
assigning prior probabilities is resolved. 

The principle of maximum entropy represents one step 
in this direction. Its use is illustrated, and a 
correspondence property between maximum-entropy probabilities 
and frequencies is demonstrated. The consistency of this
principle with the principles of conventional "direct
probability" analysis is illustrated by showing that many known 
results may be derived by either method. However, an ambiguity 
remains in setting up a prior on a continuous parameter space 
because the results lack invariance under a change of 
parameters; thus a further principle is needed. 

It is shown that in many problems, including some of 
the most important in practice, this ambiguity can be removed 
by applying methods of group theoretical reasoning which have
long been used in theoretical physics. By finding the group
of transformations on the parameter space which convert the 
problem into an equivalent one, a basic desideratum of 
consistency can be stated in the form of functional equations 
which impose conditions on, and in some cases fully determine, 
an "invariant measure" on the parameter space. The method is 
illustrated for the case of location and scale parameters, rate 
constants, and in Bernoulli trials with unknown probability of 
success. 

In realistic problems, both the transformation group 
analysis and the principle of maximum entropy are needed to 
determine the prior. The distributions thus found are uniquely 
determined by the prior information, independently of the 
choice of parameters. In a certain class of problems, 
therefore, the prior distributions may now be claimed to be 
fully as "objective" as the sampling distributions. 

Jelinski, Z.; Moranda, P.: "Applications of a
Probability-Based Model To a Code Experiment", Record, IEEE
Symposium on Computer Software Reliability, 1973, 78-80.

Jelinski, Z.; Moranda, P.: "Software Reliability Research",
Statistical Computer Performance Evaluation, Freiberger (Ed.),
Academic Press, New York, 1972.

Abstract: A software reliability study was initiated by the
Advanced Information Systems subdivision of McDonnell Douglas
Astronautics Company, Huntington Beach, California, to conduct
research into the nature of the software reliability problem
including definitions, contributing factors and means for
control.

Discrepancy reports which originated during the 
development of two large-scale real-time systems form two 
separate primary data sources for the reliability study. A 
mathematical model, descriptively entitled the 
De-Eutrophication Process, was developed to describe the time
pattern of the occurrence of discrepancies (errors). This
model has been employed to estimate the initial (or residual)
error content in a software package as well as to estimate the
time between discrepancies at any phase of its development.
Means of predicting mission success on the basis of errors
which occur during testing are described.

Problems in categorizing software anomalies are
described and the special area of the genesis of discrepancies 
during the integration of modules is discussed. Management 
techniques which should reduce the number of software anomalies 
are described. 
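
As an illustration of how such a model can yield an estimate of initial error content, here is a minimal sketch. It assumes the form in which this model is commonly cited in the reliability literature, which the abstract above does not spell out: the hazard rate before the i-th failure is taken as phi * (N - i + 1), where N is the initial number of errors and phi a proportionality constant. The function names and the sample data are hypothetical.

```python
def jm_score(n_errors, times):
    """Derivative of the log-likelihood with respect to N, with phi
    eliminated. A root of this function is the ML estimate of N."""
    n = len(times)
    total = sum(times)
    # enumerate is zero-based, so this is the sum of (i - 1) * x_i
    weighted = sum(i * x for i, x in enumerate(times))
    harmonic = sum(1.0 / (n_errors - i) for i in range(n))
    return harmonic - n * total / (n_errors * total - weighted)


def jelinski_moranda_fit(times):
    """Estimate (N, phi) from successive times between failures."""
    n = len(times)
    total = sum(times)
    weighted = sum(i * x for i, x in enumerate(times))
    # A finite estimate exists only if the intervals tend to lengthen.
    if weighted / total <= (n - 1) / 2:
        raise ValueError("data show no reliability growth")
    lo, hi = n - 1 + 1e-9, float(n)
    while jm_score(hi, times) > 0:      # expand until the sign changes
        hi *= 2
    for _ in range(200):                # plain bisection on the score
        mid = (lo + hi) / 2
        if jm_score(mid, times) > 0:
            lo = mid
        else:
            hi = mid
    n_hat = (lo + hi) / 2
    phi_hat = n / (n_hat * total - weighted)
    return n_hat, phi_hat


# Hypothetical times between successive discrepancies (any time unit).
intervals = [10, 11, 12, 14, 16, 18, 21, 25]
n_hat, phi_hat = jelinski_moranda_fit(intervals)
residual = n_hat - len(intervals)             # estimated errors remaining
next_gap = 1.0 / (phi_hat * residual)         # expected time to next failure
print(n_hat, phi_hat, residual, next_gap)
```

The residual error content and the expected time to the next discrepancy both follow directly from the two fitted constants, which is the kind of use the abstract describes.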

Jelinski, Z.; Moranda, P.: "Applications of a
Probability-Based Model to a Code Reading Experiment", Record,
IEEE Symposium on Computer Software Reliability, 1973.

Johnson, J. P., Software Reliability Measurement Study,
SAMSO-TR-75-279, Aerospace Corporation, El Segundo, CA (8
December 1975).

Abstract: The report contains plans for a complete
software reliability measurement program using both manual and
automatic data entry. The program is to be run in conjunction
with SAMTEC at Vandenberg AFB in an effort to establish
measurement and evaluation criteria for the advanced systematic
techniques for reliable operational software (ASTROS)
project. An integral part of that project is the
implementation and evaluation of structured programming
techniques.

Included in the report are all forms necessary to 
describe the software development environment, the hierarchy 
and size of programming modules, and to capture any significant 
events that will affect programming and test while they are in 
progress. Forms and instructions for their use for manual data 
collections are included, as are descriptions of items that 
could be collected automatically. 

Keezer, E. I., "Practical Experiences in Establishing Software
Quality Assurance," Proc. 1973 IEEE Symp. Computer Software 
Reliability , Brooklyn Poly. Inst., April 1973, pp. 132-135. 

King, J. C., A Program Verifier, Ph.D. Thesis, Carnegie-Mellon
University, Pittsburgh, 1969.

Abstract: This research is a first step toward
developing a "verifying compiler." Such a compiler, as well as
doing the standard translation of a program to a machine
executable form, attempts to prove that the program is
correct. In order to do this a program must be annotated with
propositions in a mathematical notation which define the
"correct" relations among the program variables. The verifying
compiler then checks for consistency between the program and 
its propositions. 

The thesis presents the theoretical basis of the 
method and then describes a prototype verifier in detail. This 
verifier, running on an IBM 360, operates on programs written
in a simple programming language for integer arithmetic. Many
programs have been automatically verified by this program.
These include a simple sort program, a program which examines a
number for the property 'prime,' and a rather subtle program
which raises an integer to an integral power.

The formal analysis of a program produces
"verification conditions" which must be proven to be theorems
over integers. The verifier proves these theorems by using 
powerful formula simplification routines and specialized 
techniques for integer expressions. Ideas for improving this 
verifier and for building one which will operate on a more 
complicated programming language are presented. 
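
To make the idea of annotating a program with propositions concrete, the sketch below takes the integer-power example mentioned in the abstract and attaches a precondition, a loop invariant, and a postcondition. This is only an illustration: the propositions are checked at run time with `assert`, whereas a verifier of the kind described would prove them for all inputs. The function name and the choice of algorithm (repeated squaring) are assumptions, not taken from the thesis.

```python
def integer_power(base, exponent):
    """Raise an integer to a non-negative integer power by repeated
    squaring, with the 'correct' relations written as assertions."""
    assert exponent >= 0                      # precondition
    result, b, e = 1, base, exponent
    while e > 0:
        # Loop invariant: result * b**e equals base**exponent.
        assert result * b ** e == base ** exponent
        if e % 2 == 1:
            result *= b                       # peel off one factor of b
        b *= b                                # square the running base
        e //= 2                               # halve the remaining exponent
    assert result == base ** exponent         # postcondition
    return result


print(integer_power(3, 13))
```

The invariant is the proposition a verifier would have to show is preserved by each pass through the loop; together with `e == 0` on exit it implies the postcondition.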

Knuth, Donald E.: "An Empirical Study of FORTRAN Programs",
CSD Report CS-186, Stanford University, 1970.

Abstract: A sample of programs, written in FORTRAN by
a wide variety of people for a wide variety of applications,
was chosen "at random" in an attempt to discover quantitatively 
"what programmers really do." Statistical results of this 
survey are presented here, together with some of their apparent 
implications for future work in compiler design. The principal 
conclusion which may be drawn is the importance of a program 
"profile," namely a table of frequency counts which record how 
often each statement is performed in a typical run; there are 
strong indications that profile-keeping should become a 
standard practice in all computer systems, for casual users as 
well as system programmers. This paper is the report of a 
three-month study undertaken by the author and about a dozen
students and representatives of the software industry during
the summer of 1970. It is hoped that a reader who studies this
report will obtain a fairly clear conception of how FORTRAN is 
being used, and what compilers can do about it. 

LaPadula, Leonard J., "Engineering of Quality Software Systems,
Vol VIII - Software Reliability Modeling and Measurement
Techniques", MITRE Corp., RADC-TR-74-325, Vol VIII, Final
Technical Report (Jan 1 - Jun 30, 1973), Jan 1975 (Under RADC
contract F19628-C-73-0001, Software Reliability and
Timeliness). (AD A007773).

Abstract: This report presents an overview of the
technological background common to the six tasks of project
522A, a part of MITRE Project 5220, The Advanced Systems 
Technology Program, under the direction of the Rome Air 
Development Center, United States Air Force. Besides 
discussing general background, this volume provides an 
introduction to each of the other seven volumes of the entire 
report. 

LaPadula, L. J.; Clapp, J. A.: "Engineering of Quality Software
Systems", MTR-2648 Volume I, The MITRE Corporation, Bedford,
Massachusetts, June 1973.

Lipow, M., Estimation of Software Package Residual Errors,
TRW-SS-72-09, TRW Systems Group, Redondo Beach, CA (Nov 1972).

Lipow, M., "Maximum Likelihood Estimation of Parameters of a
Software Time-To-Failure Distribution", TRW Systems Group, TRW
Report No. 2260.1.9-73D-15 (Rev 1), Jun 1973.

Lipow, M., "Some Variations of a Model for Software
Time-To-Failure", TRW Systems Group, Correspondence
ML-74-2260.1.9-21, Aug 1974.

Liskov, B. H.: "Guidelines for the Design and Implementation
of Reliable Software Systems", MTR-2345, The MITRE Corporation,
Bedford, Massachusetts, 14 April 1972.

Abstract: This document describes experimental
guidelines governing the production of reliable software
systems. Both programming and management guidelines are 
proposed. The programming guidelines are intended to enable 
programmers to cope with a complex system effectively. The 
management guidelines describe an organization of personnel 
intended to enhance the effect of the programming guidelines. 

Littlewood, B. and Verrall, J. L., "A Bayesian Reliability
Growth Model for Computer Software," Journal of the Royal
Statistical Society, Series C, Applied Statistics, 1973.

Lloyd, D. and Lipow, M., Reliability: Management, Methods, and
Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.

London, R. L., "Certification of the Algorithm Treesort," Comm.
ACM, Vol. 13, No. 6, 1970, pp. 371-373.

MacWilliams, W., "Reliability of Large Real-Time Control
Software Systems," Proc. 1973 IEEE Symp. Computer Software
Reliability, Brooklyn Poly. Inst., April 1973, pp. 1-6.

Abstract: This paper is written from the point of
view of the design of today's large and complex real-time
computer-based control systems using multi-processor computers.

The software reliability is not under control as a 
design tool in anything like the hardware sense. In fact, it 
is not too clear how to define software reliability in a precise
way and to measure it. What can we learn about software 
reliability by examining hardware reliability theory? 

This paper may be viewed in terms of three levels of 
definition of software reliability: a) an overall or
high-level definition, b) an intermediate-level definition 
(which might be termed a system designer's definition), and c) 
a low-level, measurement, or nitty-gritty definition.

Merritt, M. J. et al., Characteristics of Software
Quality, Report 25201-6001-RU-00, TRW Systems, Redondo Beach,
CA (December 1973).

Miller, I. and Freund, J., Probability and Statistics for
Engineers, Prentice-Hall, Englewood Cliffs, New Jersey, 1965.

MIL-STD-483, Configuration Management Practices for Systems,
Equipment, Munitions and Computer Programs.

MIL-STD-490, Specification Practices.

Mills, H. D., "On the Statistical Validation of Computer 
Programs," IBM Report PSC-72-6015, July 1970. 

MIPS (Metric Integrated Processing System) Performance and
Design Requirements, System Segment Specification,
MIPS-1023-3lt7,C6, Data Processing Directorate, Federal
Electric Corporation, Vandenberg Air Force Base, CA, Contract
No. F04701-72-C-0203 (29 November 1976).

Moranda, P., "Probability-Based Models for the Failures During
Burn-In Phase", Joint National Meeting ORSA/TIMS, Las Vegas,
NV, Nov. 1975.

Moranda, P. B. and Jelinski, Z., "Software Reliability
Research", Conference on Statistical Methods for the Evaluation
of Computer Systems Performance, Providence, R.I., Nov 1971.

Moranda, P. B. and Jelinski, Z., "Final Report on Software
Reliability Study", McDonnell Douglas Astronautics Company, MDC
Report No. 63921, Dec 1972.

Munck, R. G.: "Discussion of Session VII, Software
Reliability", Statistical Computer Performance Evaluation,
Freiberger (Ed.), Academic Press, New York, 1972, 513-514.

Nelson, E. C., A Statistical Basis for Software Reliability
Assessment, TRW-SS-73-03, TRW Systems Group, Redondo Beach, CA
(1973).

Abstract: A mathematical definition of the
reliability of a computer program is developed from the
mathematical definitions of a program and program execution 
given in Blum’s mathematical theory of the semantics of 
programming languages. The reliability so defined is 
measurable and it is related to the structural properties of 
computer programs using concepts borrowed from the PACE system 
of automated software test tools. 

Ogdin, Jerry L.: "Improving Software Reliability," Datamation
(January 1973), 49-52.

Pierce, William H.: Failure-Tolerant Computer Design, Academic
Press, New York, 1965.

Richards, F. R., "Computer Software: Testing, Reliability,
Models, and Quality Assurance", Naval Postgraduate School,
Monterey, CA, July 1974.

Rubey, R. J., "Quantitative Aspects of Software Validation,"
Proceedings, 1975 International Conference on Reliable
Software, IEEE Catalog No. 75 CH0940-7 CSR.

Abstract: This paper discusses the need for
quantitative descriptions of software errors and methods for
gathering such data. The software development cycle is
reviewed and the frequency of the errors that are detected
during software development and independent validation are
compared. Data obtained from validation efforts are presented,
indicating the number of errors in 10 categories and three
severity levels; the inferences that can be drawn from this
data are discussed. Data describing the effectiveness of
validation tools and techniques as a function of time are
presented and discussed. The software validation cost is 
contrasted with the software development cost. The 
applications of better quantitative software error data are 
summarized. 

Rudner, Beulah, "Design of a Seeding/Tagging Reliability Test,"
in Summary of Technical Progress, Software Modeling Studies,
RADC-TR-76-143, Rome Air Development Center (May 1976).

Schick, G. J.; Wolverton, R. W.: "Assessment of Software
Reliability", MDAC Paper WD 1872, McDonnell Douglas
Corporation, August 1972.

Abstract: This paper discusses the problems in
achieving reliability of large-scale software systems.
Comparative studies of a contemporary U.S. Air Force software
project, a NASA software project, and a commercial real-time
software project are described. Software development and test
management procedures which lead to software reliability are
analyzed. The underlying premise advanced is that software
reliability must be designed into the system from the beginning
using a systems approach. The paper describes the systems
approach to software reliability which requires (1)
understanding of the total software development and test life
cycle, (2) identification of conventional and extended
conventional test techniques for precision validation testing
of applications programs, and (3) allocation of resources in a
cost- and performance-effective manner, in advance, over the
entire development period. The paper focuses on the testing
approach, test planning and integration, deficiency reporting
and control, and data collection and analysis.

Schneidewind, N., "A Methodology for Software Reliability
Prediction and Quality Control," Naval Postgraduate School
Technical Report NPS55SS72111A, November 1972.

Abstract: The increase in importance of software in
command control and other complex systems requires increased
attention to the problems of software reliability and quality 
control. This paper reports on initial attempts to develop a 
methodology for Naval Tactical Data System software reliability 
and presents the results of several statistical analyses which 
were performed in order to obtain an appreciation for the 
statistical characteristics of software reliability data. An 
approach to analyzing software reliability problems is outlined 
and a methodology for reliability prediction and quality
control is presented. Characteristics of software reliability 
statistical distributions are reported. 

Schneidewind, N. F., "An Approach to Software Reliability
Prediction and Quality Control," Fall Joint Computer
Conference, 1972, pp. 837-847.

Abstract: The increase in importance of software in
command and control and other complex systems has not been
accompanied by commensurate progress in the development of
analytical techniques for the measurement of software quality
and the prediction of software reliability. This paper
presents a rationale for implementing software reliability
programs; defines software reliability; and describes some of
the problems of performing software reliability analysis. A
software reliability program is outlined and a methodology for 
reliability prediction and quality control is presented. The 
results of initial efforts to develop a software reliability 
methodology at the Naval Electronics Laboratory Center are 
reported. 


Shooman, M. L. and Natarajan, S., "Effect of Manpower Deployment
and Error Generation on Software Reliability," Proceedings of
the Symposium on Computer Software Engineering XXIV, MRI
Symposia Series, Polytechnic Press, Brooklyn, NY (1976).

Shooman, Martin L.: "Operational Testing and Software
Reliability Estimation During Program Development", Record,
IEEE Symposium on Computer Software Reliability, 1973.

Abstract: This paper discusses some quantitative
models which can be used to measure, manage, and predict the
level of perfection (freedom from bugs) of software during the
development and test stages. The measures used are the
reliability function, R(t), and the mean time between software
failures, MTTF, both of which improve as more resources (time,
man-hours, computer-hours) are expended on the program. The 
methodology described is most applicable to the last (but 
extensive) phase of software development generally called test 
and integration. 

In order to calculate the MTTF and R(t) one needs test 
data on the system, or since we wish to predict, on a 
preliminary version of the system. The obvious choice is the 
succession of updated versions of the software produced during 
system integration. It is proposed that the functional 
software test program (system exerciser) written to test all 
large software systems be used to generate this data. The only 
additional efforts required over a normal test program to 
obtain the necessary data are: (a) careful post-analysis of
test results to segregate hardware, software, and operator
errors, and (b) running of the functional test occasionally
during the entire system integration phase rather than just at
the end. 

A plot of the MTTF versus time yields a growth curve. 
Once several points on the curve have been established the 
future behavior (during test and integration and immediately 
after program release) can be predicted by extrapolation. 

Unless a technique well suited to the physical problem is used, 
extrapolation can be very misleading. A much better technique, 
requiring fewer data points for the same prediction accuracy,
is to postulate an underlying model for error removal and use 
the test data to estimate the model constants. The error model 
used in this paper is based on previous work relating R(t) and 
MTTF to debugging data. The number of errors remaining in a 
software program is probabilistically modeled in terms of the 
number of errors corrected, the program size and the initial 
number of errors. An additional assumption is made that the 
software failure rate (crash rate) is proportional to the 
number of remaining errors. This allows one to write an 
expression for the software reliability and the mean time to 
software failure. To evaluate the two constants in the model, 
it is necessary to collect test data of the type previously 
described at a minimum of two separate points in the test and 
integration phase. If data is taken at more than two points 
the additional data sets may be used to study the consistency 
of the parameters and validate or suggest changes in the basic 
model. If the model is validated and the paired parameter 
estimates are consistent, then the data at the several test 
points can be used for a pooled estimate.
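
The two-point estimation described in this abstract can be sketched as follows. The sketch assumes one plausible reading of the model: the failure rate is K * (E_T - n_c) / I_T, where E_T is the initial number of errors, n_c the number corrected so far, I_T the program size in instructions, and K the proportionality constant. The normalization by program size, the function names, and the numbers are assumptions made for illustration; the abstract gives no formulas.

```python
import math


def fit_two_points(program_size, point1, point2):
    """Solve for (E_T, K) from two test points, each given as
    (errors corrected so far, observed MTTF)."""
    (n1, mttf1), (n2, mttf2) = point1, point2
    ratio = mttf2 / mttf1
    if n1 == n2 or ratio == 1.0:
        raise ValueError("the two test points must differ")
    # From MTTF = I_T / (K * (E_T - n)): ratio = (E_T - n1) / (E_T - n2).
    e_total = (ratio * n2 - n1) / (ratio - 1.0)
    k = program_size / (mttf1 * (e_total - n1))
    return e_total, k


def failure_rate(e_total, k, program_size, corrected):
    """Failure rate, proportional to the number of remaining errors."""
    return k * (e_total - corrected) / program_size


def reliability(t, e_total, k, program_size, corrected):
    """R(t) = exp(-rate * t) for a constant failure rate."""
    return math.exp(-failure_rate(e_total, k, program_size, corrected) * t)


# Hypothetical data: a 10,000-instruction program tested at two points.
e_total, k = fit_two_points(10_000, (20, 62.5), (60, 125.0))
rate = failure_rate(e_total, k, 10_000, corrected=60)
print(e_total, k, 1.0 / rate, reliability(125.0, e_total, k, 10_000, 60))
```

With more than two test points, each pair gives its own (E_T, K) estimate, and the spread among them is one way to check the consistency the abstract mentions before pooling.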

Shooman, Martin L.: "Probabilistic Models for Software
Reliability Prediction", Statistical Computer Performance
Evaluation, Freiberger (Ed.), Academic Press, New York, 1972.

Abstract: With the advent of large sophisticated
hardware-software systems developed in the 1960s, the problem
of computer system reliability has emerged. The reliability of 
computer hardware can be modeled in much the same way as other 
devices using conventional reliability theory; however, 
computer software errors require a different approach. This 
paper discusses a newly developed probabilistic model for
predicting software reliability. The model constants are
calculated from error data collected from similar previous
programs. The calculations result in a decreasing probability
of no software errors versus operating time (reliability
function). The rate at which reliability decreases is a
function of the man-months of debugging time. Similarly, the 
mean time between operational software errors (MTBF) is 
obtained. The MTBF increases slowly and then more rapidly as 
the debugging effort (man-months) increases. The model permits 
estimation of software reliability before any code is written 
and allows later updating to improve the accuracy of the 
parameters when integration or operational tests begin. 

Shooman, M. L., "Software Reliability: Measurement and
Models", 1975 Annual Reliability and Maintainability Symposium,
Washington, DC, Jan 28-30, 1975.

Abstract: With the advent of large sophisticated
hardware-software systems developed in the 1960s, the problem
of computer system reliability has emerged. The reliability of 
computer hardware can be modeled in much the same way as other 
devices using conventional reliability theory; however, 
computer software errors require a different approach. 

The paper begins by describing the types and causes of 
software errors and provides working definitions of software 
errors and software reliability. Some of the basic data on
frequency of occurrence of errors is then discussed. The 
paper then summarizes and references some of the software 
reliability models which have been proposed and concentrates on 
one developed by the author. 

This newly developed probabilistic model predicts
reliability based on the initial number of errors in a program,
the number removed, and the number remaining in the program.
The model constants are calculated from operational test data
on the software performance.

The calculations result in a decreasing probability of 
no software errors versus operating time (reliability 
function). The rate at which the reliability decreases is a 
function of the man-months of debugging time. Similarly, the 
mean time to occurrence of operational software errors (MTTF) 
is obtained. The MTTF increases slowly and then more rapidly
as the debugging effort (man-months) increases. The model
permits estimation of software reliability before any code is 
written and allows later updating to improve the accuracy of 
the prediction when integration or operational tests begin. 

Shooman, M., et al., "Summary of Technical Progress, Software
Modeling Studies", Polytechnic Institute of New York,
RADC-TR-75-245, Interim Report, Jun 1975 (Under RADC Contract
F30602-75-C-0294) (AD A018618).

Abstract: During the period of 1 October 1974
to 30 June 1975, Polytechnic Institute of New York conducted
research under RADC contract F30602-74-C-0294 in the area of
software reliability. This report presents the progress of 
this research. Subjects of investigation were Markov models 
for the prediction of software availability, theoretical
models for software testing, automatic programming, automatic 
testing of programs and collection of error data, estimation of 
the initial number of program errors, program complexity and 
hierarchies of computable functions. 

Research into the use of Markov models for prediction
of software availability has been completed and a report,
RADC-TR-75-169, "Computer Software Reliability: Many-State
Markov Modeling Techniques," has been published covering this
topic. This technique involves using a statistical model to
predict the future performance of software using past
performance data.

Theoretical models have been studied concerning 
software testing for use in determining the minimum number of
tests that are necessary to verify that a program has been 
completely tested. This involves determining the paths that 
are contained in the program and the number of tests necessary 
to test each path. 

The seeding and tagging approach for estimating the
number of software errors in a program has been investigated
and experiments have been planned to verify this approach.
This method of estimating the initial error content of a
program involves several people debugging the same program.
The total number of errors is then statistically determined
using the number of errors found by each person that are
contained in common with a "tagged" set of errors.
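
The tagging estimate can be illustrated with a short sketch. It assumes the simplest capture-recapture form, in which the errors found by one debugger serve as the tagged set and the total is estimated as (tagged count) x (count found by a second debugger) / (count found by both). The abstract does not give the estimator actually planned for the experiments, so the formula, the function name, and the data here are illustrative assumptions.

```python
def estimate_total_errors(tagged, found):
    """Capture-recapture estimate of the total number of errors.

    tagged: set of error identifiers found by the first debugger
    found:  set of error identifiers found by a second debugger,
            working independently on the same program
    """
    common = tagged & found
    if not common:
        raise ValueError("no errors in common; the estimate is undefined")
    # The second debugger's hit rate on the tagged set, len(common) /
    # len(tagged), is taken as that debugger's hit rate on all errors.
    return len(tagged) * len(found) / len(common)


# Hypothetical example: 20 tagged errors; a second debugger finds 15
# errors, 10 of which are in the tagged set.
tagged_set = set(range(1, 21))
second_set = set(range(11, 26))
print(estimate_total_errors(tagged_set, second_set))   # prints 30.0
```

With several debuggers, one estimate per debugger can be formed against the same tagged set and the results compared or averaged.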

The possibility of reducing chances for program errors 
by matching the power of the programming language to the 
complexity of the problem being solved is being addressed by 
the investigation of hierarchies of computable functions 
defined by substitution and recursion. This research relates
to the extension of basic automata theory to set up degrees of 
difficulty in computation and to adapt the schemata provided by 
recursive function theory to programming in higher level 
languages with more useful data types.

Sukert, A. N., "A Software Reliability Modeling Study", Rome Air
Development Center (ISIS), Griffiss Air Force Base, NY,
RADC-TR-76-247, Aug 1976 (AD A030437).

Thayer, T. A., et al., Software Reliability Study, Final
Technical Report, 76-2260.1.9-5, TRW Defense and Space Systems 
Group, One Space Park, Redondo Beach, CA, Contract No. 
F30602-74-C-0036 (19 March 1976). 

Abstract: A study of software errors is presented.
Techniques for categorizing errors according to type,
identifying their source, and detecting them are discussed.
Various techniques used in analyzing empirical error data
collected from four large software systems are discussed and 
results of analysis are presented. Use of results to indicate 
improvements in the error prevention and detection processes 
through use of tools and techniques is also discussed. 

A survey of software reliability models is included, 
and recent work on TRW's Mathematical Theory of Software
Reliability (MTSR) is presented. 

Finally, lessons learned in conjunction with 
collecting software data are outlined, with recommendations for 
improving the data collection process. 

Thompson, W. and Walsh, D. "Reliability and Confidence Limits 
for Computer Software," General Research Corporation Report. 

Trauboth, H., "Guidelines for Documentation of Scientific
Software Systems," Proc. 1973 IEEE Symp. Computer Software
Reliability, Brooklyn Poly. Inst., April 1973, pp. 124-131.

Tribus, Myron; Pitts, Gary: "The Widget Problem Revisited",
IEEE Trans. on Systems Science and Cybernetics, Volume SSC-4,
No. 3, September 1968, 241-248.

Abstract: The Jaynes "widget problem" is reviewed as
an example of an application of the principle of maximum
entropy in the making of decisions. The exact solution yields 
an unusual probability distribution. The problem illustrates 
why some kinds of decisions can be made intuitively and 
accurately, but would be difficult to rationalize without the 
principle of maximum entropy. 

Trivedi, A. K. and Shooman, M., "Computer Software
Reliability: Many-State Markov Modeling Techniques",
Polytechnic Inst. of New York, RADC-TR-75-169, Interim Report,
Jul 1975 (Under RADC contract F30602-74-C-0294, Software
Modeling Studies). (AD A014824).

Abstract: Many-state Markov models have been
developed for the purpose of providing quantitative reliability
criteria for computer software. The software system under
consideration is assumed to be large, so that statistical
deductions become meaningful, and is assumed to initially
contain an unknown number of bugs. The basic models provide 
estimates and predictions for a quantifier that represents the 
state of debugging of the system and which is generally the 
most probable number of software errors that will have been 
corrected at a given time in the operation of this software 
system based upon preliminary modeling of the error occurrence 
rate and the error correction rate. The models also provide 
predictions for the availability and for the reliability of the 
system. The differential equations corresponding to the basic 
many-state Markov models are solved for verification and 
demonstrative purposes. 
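
A rough numerical sketch of a many-state chain of this kind is given below. It assumes a particular structure that the abstract does not spell out: "up" states indexed by the number of errors corrected so far alternate with "down" states in which the next error is being corrected, with an error occurrence rate `lam[k]` and an error correction rate `mu[k]` for each index. The differential equations are integrated with a simple forward Euler step. All names and rate values are hypothetical.

```python
import math


def simulate_debugging_chain(lam, mu, t_end, dt=1e-3):
    """Integrate the state equations of the up/down chain.

    lam[k]: error occurrence rate while up with k errors corrected
    mu[k]:  correction rate while down after the (k+1)-th error occurs
    Returns (availability at t_end, expected number of errors corrected).
    """
    n = len(lam)
    p_up = [1.0] + [0.0] * n        # p_up[k]: up, k errors corrected
    p_down = [0.0] * n              # p_down[k]: down, k errors corrected
    for _ in range(int(round(t_end / dt))):
        new_up, new_down = p_up[:], p_down[:]
        for k in range(n):
            fail = lam[k] * p_up[k] * dt     # flow: up(k) -> down(k)
            fix = mu[k] * p_down[k] * dt     # flow: down(k) -> up(k+1)
            new_up[k] -= fail
            new_down[k] += fail - fix
            new_up[k + 1] += fix
        p_up, p_down = new_up, new_down
    availability = sum(p_up)
    corrected = sum(k * p for k, p in enumerate(p_up))
    corrected += sum(k * p for k, p in enumerate(p_down))
    return availability, corrected


def two_state_availability(lam, mu, t):
    """Closed form for constant rates, used here only as a check."""
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)


rates_fail = [0.5] * 20             # constant rates, chosen for the check
rates_fix = [2.0] * 20
print(simulate_debugging_chain(rates_fail, rates_fix, t_end=2.0))
print(two_state_availability(0.5, 2.0, 2.0))
```

With constant rates the availability collapses to the familiar two-state result, which gives a convenient check on the integration; rates that change with the number of errors corrected are what make the many-state form useful.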

Manufacturer's data have been obtained on the
performance of system software for a medium-sized software 
operating system. These data have been analyzed to obtain 
frequency distributions of the random variables representing 
the time to close software error reports. The data are then 
used for application of the basic many-state Markov model. A 
general discussion of error data collection is undertaken in 
some detail, and suggestions are made for possible improvements 
in software error data documentation practices. 

Various extensions and modifications of the basic 
many-state Markov models are discussed. The classes of the
so-called many-state Markov G-Models and H-Models are developed to
handle, respectively, the case of arbitrary degrees of system
degradation and the case of various categories of system "down"
states. The solutions and results of some of these cases are
presented. Finally, the computational efficiency and tradeoffs
involved in the solutions of the many-state Markov models are
discussed.

Tsichritzis, D. and Ballard, A., "Software Reliability,"
INFOR, Vol. 11, No. 2, June 1973, pp. 113-124.

Abstract: Our approach assumes that there is
increasing interest in both practical and theoretical aspects
of the reliability of computer software, and this paper reviews 
many aspects of software design and production which affect 
reliability. For the most part, the topics are discussed 
relative to simple examples, and with reference to the previous 
work of others; however, a new approach to formally proving 
system correctness is presented. The system can be
represented at any instance of time by its state. The progress
of the system is represented by a state history. Any property
can therefore be described as a relation between states. The 
correctness proof is an induction with respect to the sequence 
of such states followed during execution. The paper also 
covers, in review, program design, protection, programming 
style, testing and other topics. 


Wagoner, W. L., "The Final Report on a Software Reliability
Measurement Study", The Aerospace Corp., Report No.
TOR-0074(4112)-1, Aug 15, 1973.

Abstract: This report presents the final results of a
Software Reliability Measurement Study performed by the
author. The objectives of the study were as follows:

1. To establish a rudimentary definition of software
   reliability.

2. To identify parameters affecting software failure
   rates (e.g., program size, difficulty, programmer
   experience, schedule, etc.).

3. To determine the critical parameters required for
   a software reliability model, including the
   distribution of software errors as a function of
   time.

The report includes:

1. A definition of terms relative to software
   reliability.

2. A section discussing software error detection
   rates and parameters which affect this process.

3. A summary of existing models and a comparison
   with a model proposed by the author.

4. An annotated list of references on software
   reliability.